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Abstract 


Vienna  Fortran  is  a  machine-independent  language  extension  of  Fortran,  which 
is  based  upon  the  Single-Program-Multiple-Data  (SPMD)  paradigm  and  allows  the 
user  to  write  programs  for  distributed-memory  systems  using  global  addresses.  The 
language  features  focus  mainly  on  the  issue  of  distributing  data  across  virtual  processor 
structures.  In  this  paper,  we  discuss  those  features  of  Vienna  Fortran  that  allow  the 
data  distributions  of  arrays  to  change  dynamically,  depending  on  runtime  conditions.  >  For 
We  discuss  the  relevant  language  features,  outline  their  implementation  and  describe 
how  they  may  be  used  in  applications.  I  djiq 
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1  Introduction 


High-level  language  extensions  to  Fortran,  which  enable  users  to  design  programs  for  mas¬ 
sively  parallel  computers  much  as  they  are  accustomed  to  on  a  sequential  machine,  have  been 
the  subject  of  intense  discussion  and  research  activity  in  recent  months.  Vienna  Fortran  [3, 4] 
is  one  of  several  prop<Muds  put  forth  for  such  a  set  of  language  extensions  [5,  6,  9,  12,  13]. 
A  number  of  features  of  Vienna  Fortran  have  since  been  adopted  by  the  High  Performance 
Fortran  Forum.  One  of  these  is  the  concept  of  static  and  dynamic  distributions  of  arrays 
in  a  program,  although  the  details  of  these  features  are  not  the  same  in  High  Performance 
Fortran  (HPF). 

The  language  extensions  provided  by  Vienna  Fortran  allow  the  user  to  explicitly  control 
and  specify  the  mapping  of  arrays  across  the  underlying  set  of  processors.  The  computation, 
however,  is  still  specified  using  a  global  address  space  which  is  independent  of  the  distribution 
of  the  data.  That  is,  the  programmer  writes  code  using  a  single  thread  of  control  just  as  when 
writing  a  sequential  program.  It  is  the  compiler’s  responsibility  to  produce  code  suitable  for 
parallel  execution. 

The  Vienna  Fortran  Compilation  System  generates  code  based  on  the  SPMD  (Single 
Program  Multiple  Data)  model,  in  which  each  processor  executes  essentially  the  same  code, 
but  on  a  local  data  set.  The  mapping  specification  provided  by  the  user  determines  the 
ownership  of  data:  a  processor  owns  the  data  which  is  distributed  to  it,  and  stores  it  in  its 
local  memory.  In  general,  the  compiler  distributes  work  based  upon  the  owner  computes 
rule:  the  processor  performs  the  computation  that  defines  data  elements  owned  locally. 
The  compiler  satisfies  any  non-local  references  required  for  this  computation  by  inserting 
commimication  statements  to  transfer  the  data. 

The  performance  of  the  generated  code  is  critically  dependent  on  the  data  distribution 
used  for  the  program.  A  distribution  is  selected  with  the  snms  of  spreading  the  workload 
as  evenly  as  possible  across  the  processors,  while  preserving  the  locality  of  computation. 
The  appropriate  distribution  for  a  given  code  will  depend  on  the  characteristics  of  both 
the  program  itself  and  that  of  the  target  architecture.  The  former  includes  factors  such  as 
the  data  access  patterns  exhibited  by  the  code,  and  the  size  of  the  data  structures  relative 
to  the  munber  of  processors  used  for  a  particular  execution.  The  hu'dware  factors  include 
the  communication  latency  and  bandwidth,  the  computation/communication  ratio,  and  the 
cache  behavior  of  the  machine. 

If  these  factors  can  be  determined  statically,  then  the  user  can  choose  the  “best”  data 
distribution  at  compile  time.  However,  in  situations  where  the  program  behavior  is  depen¬ 
dent  on  runtime  values,  the  choice  of  the  appropriate  distribution  may  be  made  at  runtime 
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if  there  is  language  support  for  dynamic  distributions.  Major  uses  of  dynamic  distribution 
of  data  in  programs  are  to: 

•  improve  the  locality  of  data  accesses  in  codes  with  identifiable  computation  phases, 

•  write  highly  portable  code  in  which  the  data  distributions  are  selected  on  the  basis  of 
input  data  and/or  characteristics  of  the  executing  machine, 

•  maintain  a  good  load  balance  throughout  the  execution  of  a  program  for  which  the 
workload  varies  significantly  during  the  computation. 

There  are  significant  costs  associated  with  using  dynamic  distribution  of  data.  At  run 
time,  this  includes  the  cost  of  performing  the  actual  data  transfers  and  the  cost  of  maintain¬ 
ing  runtime  information  about  the  current  distribution.  At  compile  time,  a  more  rigorous 
analysis  must  be  performed  to  determine  the  distributions  associated  with  a  particular  data 
reference.  In  particular,  the  compiler  has  to  generate  code  which  allows  for  the  possibility 
that  several  data  distributions  may  reach  some  statements.  Despite  these  costs,  the  judi¬ 
cious  use  of  dynamic  distribution  features  can  reduce  the  overall  communication  costs  of  the 
program  while  improving  the  load  balance.  Thus,  the  overall  performance  of  the  code  may 
improve  even  in  the  presence  of  the  runtime  overheads. 

In  this  paper,  we  present  the  language  features  of  Vienna  Fortran  which  support  dynamic 
distribution  of  data.  Section  2  describes  the  distribution  facilities  along  with  some  control 
constructs  required  for  expressing  code  in  the  presence  of  redistribution  of  data.  The  compiler 
and  runtime  support  required  for  implementing  these  features  is  discussed  in  Section  3  while 
their  usefulness  for  scientific  codes  is  considered  in  Section  4.  The  paper  concludes  with  a 
discussion  of  related  work  and  some  final  remarks. 


2  Distribution  and  Alignment  in  Vienna  Fortran 

The  Vienna  Fortran  language  extensions  include  features  for  the  specification  of  the  pro¬ 
cessors  which  execute  the  program,  the  distribution  of  arrays  to  subsets  of  processors, 
alignment  between  arrays,  flexible  mechanisms  for  the  transfer  of  arguments  to  proce¬ 
dures,  and  explicitly  parallel  asynchronous  forall  loops  [4,  16].  In  this  section,  we  focus 
only  on  the  aspects  relevant  in  the  context  of  dynamic  array  distributions. 

2.1  Basic  Notation  and  Terminology 

Each  array  A  is  associated  with  an  index  domain  which  we  denote  by  I^.  An  index 
mapping  from  an  index  domain  I  to  an  index  domain  J  is  a  total  function  t  :  1  —*  V 
(J)  —  {</>},  where  V  (J)  denotes  the  powerset  of  J. 
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A  distribution  of  an  array  maps  each  array  element  to  one  or  more  processors  which 
become  the  owners  of  the  element  and,  in  this  capacity,  store  the  element  in  their  local 
memory.  We  model  distributions  by  mappings  between  the  associated  index  domains: 

Definition  1  Let  A  denote  an  array,  and  R  a  processor  array.  An  index  mapping  6^  from 
to  is  called  a  distribution  for  A  with  respect  to  R. 

An  alignment  establishes  a  relationship  between  elements  of  different  arrays  such  that 
corresponding  elements  are  guaranteed  to  reside  in  the  same  processor: 

Definition  2  Let  A,  B  denote  arbitrary  arrays.  An  index  mapping  Og  from  1*  to  I®  is 
called  an  alignment  for  A  with  respect  to  B. 

Given  Sg,  Sg  is  determined  as  follows:  For  each  i  €  ; 

^^(i)  :=  CONSTRUCT{ai,6^)  =  Uj6„(i)«|a) 

2.2  Specification  of  Distribution  and  Alignment 

Distributions  are  specified  in  a  program  by  distribution  expressions.  E)ach  distribution 
expression,  for  example  (BLOCK, CYCLIC (K)),  determines  a  class  of  distributions  which  is 
called  a  distribution  type.  The  application  of  a  distribution  type  to  a  (data)  array  and  a 
processor  section  yields  a  distribution. 

Simple  distribution  expressions  specify  mappings  between  one  array  dimension  and  one 
processor  dimension;  they  include  the  intrinsic  distribution  functions  BLOCK,  CYCLIC, 
S^LOCK,  and  B -BLOCK.  BLOCK  distributes  one  array  dimension  to  one  processor  di¬ 
mension  in  evenly  sized  segments.  CYCLIC  maps  elements  of  an  array  dimension  in  a 
round-robin  fashion  to  a  dimension  of  the  processor  array.  SJBLOCK  and  B-BLOCK  per¬ 
mit  the  specification  of  contiguous  irregular  blocks,  introducing  the  concept  of  general  block 
distributions. 

Distribution  expressions  associated  with  multi-dimensional  arrays  may  be  specified  as  a 
list  of  simple  distribution  expressions,  each  one  corresponding  to  exactly  one  array  dimen¬ 
sion.  The  elision  symbol  in  such  a  list  prevents  the  associated  array  dimension  from 
being  distributed. 

Alignments  are  expressed  in  a  Vienna  Fortran  program  by  alignment  specifications. 
We  illustrate  their  use  -  together  with  distribution  expressions  -  by  a  simple  example: 
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Example  1  Distribution  and  Alignment 

PARAMETER  (M=:2) 

PROCESSORS  R(1:M,1:M) 

REAL  C(10,10,10)  DIST  (BLOCK, BLOCK,:)  TO  R 
REAL  D(10,10,10)  ALIGN  D(I,J,K)  WITH  C{J,I,K) 

R  denotes  a  two-dimensional  processor  array.  The  distribution  of  array  C  is  specified  by  the 
distribution  expression  (BLOCK,BLOCK,:)  which  indicates  that  the  first  two  dimensions 
are  distributed  by  BLOCK,  while  the  third  dimension  is  not  distributed.  More  precisely, 
S^{i,j,k)  =  {/2(ri],  rjl)}  for  all  jb,  1  <  ib  <  10.  The  alignment  specification  forD  transposes 
the  first  and  second  dimensions  of  C,  i.e.,  the  resulting  alignment  function  maps  each  index 
triplet  {i,j,k)  in  to  the  index  triplet  (j^ifk)  in 

2.3  Dynamically  Distributed  Arrays 

The  language  distinguishes  between  statically  and  dynamically  distributed  arrays,  depending 
on  whether  or  not  the  association  between  an  array  and  its  distribution  is  invariant  in  a  given 
scope*.  This  distinction  is  made  syntactically  in  the  declaration  of  the  array.  The  arrays 
shown  in  Example  1  were  statically  distributed. 

We  define  an  equivalence  relation,  connect,  in  the  set  of  dynamically  distributed  arrays 
within  a  given  scope.  This  relation  satisfies  the  following  conditions: 

1.  Each  equivalence  class  consists  of  one  distinguished  member,  the  primary  array,  B, 
of  the  class,  and  0  or  more  secondary  arrays.  We  denote  the  class  associated  with 
primary  array  B  by  C{B). 

2.  The  distribution  of  each  secondary  array  A  €  C{B),  if  any,  is  defined  in  the  declaration 
of  A  by  referring  to  B  in  a  secondary  array  annotation,  which  specifies  a  connection 
by  distribution  extraction  [16]  or  alignment. 

3.  Distribute  statements  are  explicitly  applied  to  primary  arrays  only;  their  effect  is  to 
redistribute  all  arrays  in  the  associated  equivalence  class  so  that  the  connection  is 
mrintained. 

4.  The  distributions  of  arrays  in  different  eqmvalence  classes  are  independent  of  each 
other. 

5.  The  connect  relation  does  not  extend  across  procedure  boundaries. 

*If  no  ambiguity  is  possible,  we  simply  refer  to  static  or  tfynamtc  distributions. 
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An  annotation  specifying  as  primary  arrays  has  the  form 

REAL  Bi(. B](. •  •)  DYNAMIC  [,(ftstn6utton-ran^e]  [, initial-distribution] 

A  distribution  range  determines  the  set  of  ail  distribution  types  (or  a  superset  thereof) 
which  can  be  associated  with  the  arrays  Bi  during  the  execution  of  the  procedure  in  which  the 
declaration  occurs.  The  distribution  range  is  specified  by  the  keyword  RANGE ,  followed 
by  a  parenthesized  list  of  distribution  expressions  (see  Section  2.2).  The  can  be  used  as 
a  “don’t  care”  symbol.  Distribute  statements  applied  to  the  Bi  must  respect  the  restrictions 
imposed  by  this  attribute. 

If  no  distribution  range  is  specified,  then  there  is  no  restriction  on  the  distributions  that 
can  be  associated  with  a  primary  array. 

An  initial  distribution  is  evaluated  and  associated  with  each  Bi  each  time  the  array  is 
allocated.  An  array  for  which  an  initial  distribution  has  not  been  specified  cannot  be  legally 
accessed  before  it  has  been  explicitly  associated  with  a  distribution  by  the  execution  of  either 
a  distribute  statement  or  a  procedure  call. 

A  secondary  array  annotation,  for  the  arrays  At,  has  the  form 

REAL  y4i(... ),..., A.(...)  DYNAMIC,  CONNECT  connection 

The  connection  can  be  either  a  distribution  extraction  [16],  or  an  alignment  specification.  In 
both  cases,  all  secondary  arrays  Aj  are  connected  to  a  primzury  array  B.  As  a  result  of  this 
declaration,  the  Aj  are  entered  into  the  equivalence  class  C{B). 

Example  2  Dynamic  array  annotations 

REAL  B1(M)  DYNAMIC 

REAL  B2(N)  DYNAMIC,  D1ST{  BLOCK) 

REAL  B3(N,N),  B4(N,N)  DYNAMIC,  RANGE  {{BLOCK, BLOCK), (*, CYCLIC)), 
k  DIST  (  BLOCK,  CYCLIC ) 

REAL  A1(N,N)  DYNAMIC,  CONNECT  (=B4) 

REAL  A2(N,N)  DYNAMIC,  CONNECT  A2(I,J)  WITH  B4(I,J) 

All  arrays  declared  here  are  dynamically  distributed;  BI  through  BA  are  primary,  Al  and 
A2  secondary  arrays.  For  BI,  no  distribution  range  and  no  initial  distribution  are  given. 
For  B2,  no  distribution  range  is  given,  and  {BLOCK)  is  specified  as  initial  distribution. 
For  BZ  and  BA,  a  distribution  range  as  well  as  an  initial  distribution  are  specified.  Al  ts 
connected  to  BA  via  distribution  extraction  while  A2  uses  an  (identity)  alignment  to  specify 
the  connection.  As  a  consequence,  C{BA)  D  {BA,Al,A2};  the  connections  specified  ensure 
that  the  distribution  type  of  Al  and  A2  will  be  always  the  same  as  that  of  BA. 
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2.4  Distribute  Statements 

A  distribute-statemeni  has  the  form 


DISTRIBUTE  B::  da  [notransfer-attrihute] 

where  B  is  an  array  name  associated  with  a  primary  array,  and  da  is  either  a  distribution 
expression,  possibly  associated  with  a  processor  section,  or  an  alignment  specification. 

The  distribute  statement  is  executed  as  follows: 

First,  a  set  NOTRANSFER  is  determined  as  the  set  of  all  names  specified  in  the  notransfer- 
attribute,  or  the  empty  set  in  the  default  case.  All  names  in 
NOTRANSFER  must  be  secondary  arrays  in  C{B). 

Secondly,  da  is  evaluated;  its  result  is  used  to  determine  a  distribution,  6^,  for  array  B. 
Thirdly,  for  each  secondary  array  A  in  C{B),  its  distribution,  S^,  is  determined  from  the 
distribution  type  associated  with  da,  I^,  and  the  connection  between  A  and  B,  as  established 
in  the  associated  secondary  array  annotation.  If  A  is  a  member  of  NOTRANSFER,  then 
only  the  access  function  for  A  is  changed  and  the  elements  of  the  array  are  not  physically 
moved. 

Example  3  Distribute  Statement 

We  refer  to  the  declarations  in  the  previous  example.  It  is  assumed  that  the  statements 
below  are  executed  unconditionally  in  the  order  of  their  appearance  in  the  text. 

DISTRIBUTE  Bl::  (BLOCK) 

K  =  expr 

DISTRIBUTE  B1,B2::(  CyCX/C(K)) 

DISTRIBUTE  B3 ( BLOCK, CYCLIC) 

DISTRIBUTE  B4 ::  (=B1,  CYCLIC (Z)) 


In  the  first  statement,  the  array  Bl  is  distributed  by  (BLOCK). 

In  the  second  statement,  Bl  and  B2  (both  of  which  are  currently  distributed  by  (BLOCK)) 
are  redistributed  as  (CYCLIC(l^),  where  V  denotes  the  value  assigned  to  the  variable  K 
in  the  assignment  K  =  expr. 

The  third  statement  redistributes  BZ  as  (BLOCK,  CYCLIC);  in  the  next  statement,  BA 
and  the  associated  secondary  arrays  A1  and  A2  are  distributed  as  (CYCLIC(hf),  CYCLIC(Z)) 
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2.5  Control  Constructs 


The  capability  to  redistribute  data  at  an  arbitrary  position  in  a  Vienna  Fortran  program, 
including  within  conditionals,  implies  that 

•  an  array  reference  in  the  program  may,  at  run-time,  be  reached  by  more  than  one 
distribution  for  the  array,  and 

•  the  compiler  may  not  be  able  to  determine  precisely  the  set  of  all  distributions  reaching 
such  a  reference,  no  matter  how  much  analysis  is  performed. 

Thus,  control-constructs  have  been  included  in  the  language  to  alleviate  the  problems 
arising  from  this  situation:  first,  they  allow  the  user  to  formulate  an  algorithm,  depending 
on  the  actual  distribution  type  of  one  or  more  arrays;  secondly,  they  provide  the  compiler 
with  information  about  the  distribution  of  arrays.  They  include  the  dcase-construct,  which 
is  modeled  after  the  Fortran  90  CASE  construct,  and  the  if-construct,  which  is  based  on  a 
generalized  form  of  logical  expressions,  and  the  related  Fortran  if  statements. 

2.5.1  The  DCASE  Construct 
The  dcase-construct  has  the  form 

SELECT  DCASE  (Aj, . . . ,  A,) 
capi,...,capm 
END  SELECT 

where 

•  r  >  1  and  all  A,-,  1  <  t  <  r,  are  array  names.  The  A,-  are  called  selectors.  At  the 
time  of  execution  of  the  dcase  construct,  each  selector  must  be  allocated  and  associated 
with  a  well-defined  distribution. 

•  m  >  1  and  each  capj,  1  <  j  <  m,  is  a  condition-action-pair,  where  the  condition  is 
either  a  query-list  or  the  keyword  DEFAULT ,  and  the  action  is  a  block.  A  block  is  a 
sequence  of  executable-statements,  including  the  statements  of  the  language  extension, 
except  for  the  distribute  statement.  None  of  the  statements  in  a  block  may  be  the 
target  of  a  branch  from  outside  of  that  block.  It  is  permissible  to  branch  to  an  end- 
select-statement  only  from  within  the  dcase  construct. 

The  dcase  construct  selects  at  most  one  of  its  constituent  blocks  for  execution.  It  is 
evaluated  as  follows: 
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1.  The  distribution  of  each  selector,  and  its  type,  are  determined. 

2,  Let  (ci,  oi),  (cj,  02), . . .  denote  the  sequence  of  condition  action  pairs  in  the  dcase  con¬ 
struct.  Then  ci,  C2, . . .  aure  sequentially  evaluated  until  either  a  j,  1  <3  <  m  is  reached 
such  that  Cj  matches,  or  no  match  occurs. 

If  Cj  matches,  then  the  2issociated  action  aj  is  executed.  This  completes  the  execution 
of  the  dcase  construct.  If  no  match  occurs,  the  execution  of  the  construct  is  completed 
without  executing  an  aiction. 

A  condition  Cj  matches  iff  either  Cj  is  the  keyword  DEFAULT ,  or  Cj  is  a  list  of  queries, 
each  of  which  matches.  Each  query  tests  the  distribution  of  one  selector  array.  Query 
lists  may  be  either  positional  or  name-tagged.  In  a  positional  query  list,  the  queries 
are  associated  with  the  selectors  >li,A2, ...  in  this  order.  In  a  name-tagged  query  list,  the 
selector  associated  with  each  query  is  explicitly  specified  by  a  name-tag.  The  order  in  which 
the  queries  occur  in  such  a  list  is  semantically  irrelevant.  A  query  list  need  not  contain  a 
query  for  every  selector.  In  such  a  case,  an  implicit  is  inserted  for  every  selector  which 
is  not  represented. 

Full  details  of  the  matching  process  are  given  in  [16]. 

Example  4  The  dcase  construct 

REAL  B1(M)  DYNAMIC 

REAL  B2(N)  DYNAMIC,  mSTi  BLOCK) 

REAL  B3(N,N),  DYNAMIC,  RANGE  {{BLOCK, BLOCK), 

k  (CYCLIC,  CYCLIC(*) ),  (*,  CYCLIC) ),  DIST  ( BLOCK,  CYCLIC ) 

«  •  • 

SELECT  DCASE  (B1,B2,B3) 

CASE  ( BL0CK),(BL0CK),(CYCLIC(2), CYCLIC) 
oi 

CASE  Bl:  (  CYCLIC),  B3:(  BLOCK,*)) 

02 

CASE  BZ:{BLOCK,CYCLIC) 

<*3 

CASE  DEFAULT 
END  SELECT 

In  the  following,  let  U  denote  the  distribution  type  associated  with  B, . 

The  first  query  list  is  positional;  it  matches  if  t\  =  <2  =  {BLOCK),  and  <3  = 
{CYC  Lie  {2),  CYCLIC). 

The  second  list  is  name-tagged;  it  matches  ift\  =  {CYCLIC),  <3  =  {BLOCK,t'),  where 
t'  is  arbitrary,  and  t^  is  any  distribution  type. 
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The  third  query  list  matches  ift^  =  (BLOCK,  CYCLIC).  <1,^2  (^re  irrelevant  in  this  case. 
Finally,  the  fourth  query  list  is  always  matched.  Thus,  if  none  of  the  first  four  query  lists 
match,  then  a^  will  be  executed. 

2.S.2  The  IF  Construct 

The  if-construct  of  Vienna  Fortran  is  based  upon  a  generalized  logical  expression,  which 
is  a  Fortran  logicaLexpression  that  in  addition  may  contain  references  to  the  intrinsic  function 
IDT.  This  function  performs  a  test  of  the  distribution  types  associated  with  their  arguments 
and,  optionally,  of  the  processor  sections  to  which  the  arguments  are  distributed;  it  yields  a 
logical  value.  For  example,  the  second  clause  in  the  dcase  construct  above  can  be  explicitly 
expressed  as 

IF  ( /£)T(B1,(  CYCLIC)))  .  AND .  ( /Z)7’(B3,(  BLOCK(*) )))  THEN 
02 

3  Implementation 

In  this  section,  we  briefly  describe,  at  an  abstract  level,  the  support  for  dynamic  data  dis¬ 
tributions  in  the  Vienna  Fortran  Compiler  System  (VFCS).  More  details  of  the  compilation 
strategy  used  in  VFCS  are  given  in  [7,  17).  Some  of  the  issues  discussed  here  are  also  being 
handled  in  other  systems  [1,  2,  8,  14]. 

The  feat.ures  required  to  manage  dynamic  data  distribution  comprise  both  compile  time 
and  run  time  elements.  Most  of  these  features  are  actually  required  to  handle  other  aspects 
of  Vienna  Fortran:  in  particular,  many  of  the  problems  posed  by  run  time  redistribution  of 
data  structures  are  the  same  as,  or  similar  to,  those  posed  by  the  redistribution  of  arrays  at 
subroutine  boundaries,  and  those  posed  by  the  fact  that  in  any  code,  several  arrays,  with 
possibly  distinct  distributions,  may  be  bound  to  the  same  formal  argument  of  a  subroutine. 

3.1  Compiler  Support 

There  are  two  major  phases  in  the  compiler:  analysis  and  code  generation.  The  most 
important  task  in  the  analysis  phase  is  solving  the  reaching  distribution  problem:  that  is, 
the  compiler  must  determine  the  range  of  distribution  types  which  may  reach  a  speciflc 
array  access  in  the  code,  by  intra-  and  inter-procedmal  analysis.  This  is  performed  both  for 
declared  (and  explicitly  distributed)  arrays  as  well  as  for  formal  subroutine  arguments.  The 
system  constructs  pairs  consisting  of  a  distribution  type  and  a  target  processor  array.  We 
call  the  set  of  all  such  pairs  which  is  valid  for  a  specific  array  at  a  speciflc  position  in  the 
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program  the  set  of  plausible  distributions.  The  information  computed  may  not  be  precise, 
since  some  of  the  distributions  may  not  actually  be  assumed  at  run  time.  If  the  full  code  is 
not  available,  the  compiler  will  have  to  rely  on  range  specifications  provided  by  the  user,  or 
make  worst  case  assumptions. 

An  extensive  communication  analysis  provides  not  only  information  on  the  communica¬ 
tion  associated  with  each  plausible  distribution  for  an  array,  but  also  the  memory  require¬ 
ments  of  the  array  under  that  distribution.  The  details  of  this  analysis  are  outside  the  scope 
of  this  paper. 

The  compiler  also  performs  a  partial  evaluation  of  distribution  queries  (both  IDT  and 
the  dcase  construct),  by  checking  whether  there  is  a  plausible  distribution  which  will  match. 

The  compiler  must  perform  many  related  taaks  during  during  code  generation.  In  par¬ 
ticular,  it  generates  code  to  create  and  maintain  data  structures  describing  the  distributions 
and  other  attributes  of  arrays,  such  as  the  associated  overlap  areas.  The  compiler  also  in¬ 
serts  calls  to  run  time  routines  to  perform  communication  as  necessary  and  to  routines  which 
perform  the  redistribution  of  data. 

3.2  Run  Time  Support 

The  run  time  support  required  may  be  described  as  the  Vienna  Fortran  Engine  (VFE), 
an  abstract  machine  that  executes  Vienna  Fortran  object  programs.  VFE  is  a  machine  at 
a  higher  level  of  abstraction  than  the  vendor-supplied  hardware/operating  system  interface. 
It  is  realized  by  a  set  of  run  time  libraries  which  provide  the  required  functionality  on  a 
specific  target  architecture.  In  particular,  these  provide  complex  data  organization  and 
access  schemes,  and  high-level  operations: 

•  The  memory  management  scheme  of  the  VFE  is  inherently  dynamic.  Even  without 
dynamic  distributions,  the  actual  allocation  of  an  array  to  the  processors’  memories 
may  not  be  known.  Redistribution  reqviir^,  in  addition,  the  possibility  of  reallocation. 

•  The  data  organization  and  access  features  provided  by  the  VFE  include: 

1.  Data  access  functions  for  Vienna  Fortran  distributions  (including  the  implementa¬ 
tion  of  irregular  accesses  via  translation  tables  and  sophisticated  buffering  schemes 
for  accesses  to  non-local  objects,  as  implemented  in  the  PARTI  routines  [15]). 

2.  An  interface  for  external  distribution  generators  and  specifiers. 

3.  Run  time  optimization  of  communication  related  to  dynamic  array  references. 
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•  A  run  time  library  of  communication  routines  for  transferring  single  array  elements 
and  array  sections,  including  specialized  routines  for  handling  reductions. 


•  Routines  to  perform  the  tasks  associated  with  DISTRIBUTE,  construct  access  func¬ 
tions,  to  modify  descriptors  associated  with  arrays  (this  information  may  be  modified 
when  the  distribution  is  changed,  or  on  entry  to  a  subroutine),  and  test  information 
stored  in  these  as  required  for  the  implementation  of  IDT  and  the  dcase  construct. 

3.2.1  Run-Time  Representation  of  Arrays 

Some  of  the  relevant  components  of  the  information  related  to  an  array  stored  locally  in 
each  processor  are  the  data  structures  and  access  functions  listed  below.  Here,  A  denotes  an 
array  name,  and  p  a  processor. 

Data  Structures: 

•  indtx.dom(A)  specifies  the  index  domain  of  A. 

•  dist{A)  characterizes  the  distribution  of  A,  which  includes  a  distribution  type,  and  a 
specification  of  the  target  processors.  For  certain  complex  distributions,  a  pointer  to 
a  translation  table  is  required^ 

•  connect-class{A)  determines  the  set  of  secondary  arrays  connected  to  a  primary  array. 

•  alignment(C)  specifies,  for  each  array  C  in  connect.class(A),  the  alignment  of  C  with 
respect  to  A. 

•  For  every  i  such  that  is  owned  by  processor  p,  loc.map^fi)  specifies  the  offset  of 
A(i)  in  the  local  memory  of  processor  p. 

•  For  regular  and  irregular  BLOCK  distributions,  segment^  specifies  the  sequence  of  the 
local  lower  and  upper  bounds  in  each  dimension. 

Access  iVinctions 

•  Access  in  processor  p  to  local  array  element  A(i)  is  performed  by  evaluating  locjmap^  (\). 

•  Access  in  processor  p  to  a  non-local  array  element  A(i)  is  performed  by  determining  a 
processor  q  owning  A(i)  from  dist(A),  and  inserting  message  passing  operations  that 
send  the  required  element  from  q  to  p. 

tPor  dummy  suguments,  the  description  may  include  a  pointer  to  another  array  representation  and/or 
sectioning  operations. 


3.2.2  Implementation  of  DISTRIBUTE 
Consider  the  statement 


DISTRIBUTE  B::  da  [notransfer-attributej 

where  da  is  a  distribution  expression  or  an  alignment  specification,  and  the  notransfer- 
attribute  determines  the  set  NOTRANSFER={Ci, . . .  ,Cm}  (see  Section  2.4).  The  realiza¬ 
tion  of  this  statement  is  handled  by  a  run-time  routine  executed  on  each  processor  which  is 
passed  the  array  and  its  current  set  of  descriptors  aud  returns  new  descriptors.  Each  pro¬ 
cessor  determines  the  new  locations  of  current  local  data,  sends  it  to  the  nev  locations,  and 
receives  data  firom  other  processors.  Data  motion  is  suppressed  where  data  flow  analysis,  or 
a  NOTRANSFER  specification,  permits. 

This  corresponds  to  executing  the  following  sequence  of  steps  on  each  processor: 

•  Step  1:  Evaluate  the  new  distribution  and  the  associated  access  functions 

1.  Evaluate  the  new  distribution:  di$t(B):=  eval(da) 

2.  Determine  the  functions  loc-map  and  segment  from  dist(B  ) 

•  Step  2:  Determine  the  distributions  of  the  arrays  connected  to  B: 

for  every  C  €  connect  jclass{B)  —  {5}  do 
dist(C):=  CONSTRUCT\aUgnment{C),  8^) 

endfor 

Here,  the  application  of  the  function  CONSTRUCT  to  the  alignment  function  associ¬ 
ated  with  C  and  the  new  distribution  of  B  yields  the  new  distribution  of  C. 

•  Step  3:  Communicate 

for  every  C  such  that  (C  €  connect jdass{B)  —  NOTRANSFER)  A  (the  previous 
distribution  of  C  is  old-dist(C))  do 
COMMUNICATE(C,old.dist(C),dist(C)) 

endfor 
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PARAMETER  (NX  =  100,  NY  =  100) 

REAL  U(NX,  NY),  F(NX,  NY)  DIST(:,  BLOCK) 

REAL  V(NX,  NY)  DYNAMIC,  RANGE ( {:,  BLOCK),  (  BLOCK, :)), 
k  DIST  (:,  BLOCK) 

CALL  RESID(  V,  U,  F,  NX,  NY) 

C  Sweep  over  x-lines 
DO  J  =  1,  NY 

CALL  TRIDIAG(  V(:,  J),  NX) 

ENDDO 

DISTRIBUTE  V  ::  (  BLOCK,  :  ) 

C  Sweep  over  y-Unes 
DO  I  =  1,  NX 

CALL  TRIDIAG(  V(I,  :),  NY) 

ENDDO 

Figure  1:  ADI  iteration  in  Vienna  Fortran 

4  Applications 

In  this  section,  we  discuss  the  benefits  of  dynamic  distribution  of  data  for  scientific  codes. 
We  present  several  examples  in  which  using  dynamic  data  distributions  allows  the  user  to 
choose  the  appropriate  data  distribution  based  on  the  rimtime  behavior  of  the  program. 

Consider  first  the  case  in  which  a  runtime  value  determines  the  choice  of  the  best  distri¬ 
bution.  For  example,  in  a  grid  based  computation,  such  as  smoothing,  the  value  at  a  grid 
point  is  based  on  its  4  nearest  neighbors.  A  colmnn  distribution  of  the  N  x  N  grid  will  give 
rise  to  2  messages  per  processor,  each  of  size  N,  per  computation  step.  On  the  other  hand, 
if  the  grid  is  distributed  by  blocks  in  two  dimensions  across  a  processor  array,  then  each 
computation  step  requires  4  messages  of  size  N/p  each  on  each  processor.  Thus,  given  the 
startup  overhead  and  cost  per  byte  of  each  message  of  the  target  machine,  the  ratio  N/p  will 
determine  the  most  appropriate  distribution.  If  the  code  has  been  written  such  that  the  size 
of  the  grid  is  an  input  parameter,  then  the  user  can  use  the  dynamic  distribution  facilities 
of  Vienna  Fortran  to  set  the  distribution  of  the  grid^ 

Another  class  of  codes  which  can  benefit  from  dynamic  distributions  are  codes  which 
exhibit  different  data  access  patterns  in  different  phases  of  the  program.  Dynamic  data 

^Vienna  Fortran  supports  an  intrinsic  function  $NP  which  returns  the  number  of  processors  being  used 
to  execute  the  program  and  can  be  used  to  compute  the  ratio  N/p. 
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distributions  can  be  used  to  control  the  locality  of  data  access  in  such  codes.  For  example, 
consider  ADI  (Alternating  Direction  Implicit)  codes  [11]  used  for  solving  partial  differential 
equations  in  computational  fluid  dynamics  and  other  areas  of  computational  physics.  The 
name  ADI  derives  from  the  faxrt  that  ^implicit”  equations,  usually  tridiagonal  systems,  are 
solved  in  both  the  x  and  y  directions  at  each  step.  In  terms  of  data  structure  access,  one 
step  of  the  algorithm  can  be  described  as  follows:  an  operation  (a  tridiagonal  solve  here) 
is  performed  independently  on  each  x-line  of  the  array  and  the  same  operation  is  then 
performed,  again  independently,  on  each  y-line  of  the  array.  The  tridiagonal  solve  has  a 
recurrence  and  thus  generates  data  dependencies  along  the  columns  in  the  first  phase  and 
along  the  rows  in  the  second  phase. 

There  are  two  broad  choices  in  such  situations  [4].  We  could  choose  a  single  distribution 
for  the  whole  program  so  that  data  accesses  are  satisfied  locally  in  one  phase  while  paying 
the  communication  costs  in  the  other  phase.  On  the  other  hand,  we  could  dynamically 
redistribute  the  data  so  that  data  accesses  in  all  phases  are  satisfied  locally. 

In  Figure  1,  we  present  a  Vienna  Fortran  code  fragment  which  employs  the  latter  strategy. 
The  tridiagonal  solves  are  performed  by  a  sequential  routine  TRIDIAG  (not  shown  here) 
which  is  given  a  right  hand  side  and  overwrites  it  with  the  solution  of  a  constant  coefficient 
tridiagonal  system.  The  array  V  is  declared  as  DYNAMIC  and  is  initially  distributed  by 
block  in  the  second  dimension.  Thus,  in  the  first  loop  which  performs  the  sweep  over  columns 
(representing  x-lines),  each  column  is  local  to  a  processor  and  causes  no  communication. 
The  array  is  then  explicitly  remapped  to  be  distributed  by  block  in  the  first  dimension.  This 
allows  the  second  loop,  a  sweep  over  y-lines,  to  also  be  executed  without  any  communication. 
Thus,  all  the  communication  is  confined  to  the  redistribution  operation,  with  only  local 
accesses  during  the  computation. 

If  the  array  is  not  explicitly  redistributed  between  the  two  loops,  then  the  argument  to 
the  second  call  to  TRIDIAG  is  distributed  across  a  set  of  processors  and  it  becomes  the 
responsibility  of  the  compiler  to  embed  the  required  communication  in  the  generated  code. 
The  eflSciency  of  the  resulting  code  will  depend  on  various  factors  including,  in  particular,  the 
analysis  capabilities  of  the  compiler.  The  dynamic  distribution  facilities  of  Vienna  Fortran 
make  it  easy  for  the  user  to  restrict  the  communication  to  the  redistribution  operation  which, 
at  least  in  the  above  code,  can  be  implemented  by  an  efficient  pre-compiled  routine. 

For  the  examples  given  above  it  is  possible  to  write  the  code  without  using  explicit 
redistribution  statements.  For  example,  one  could  declare  two  or  more  arrays  with  different 
static  distribution  and  use  array  assignments  to  produce  the  effect  of  redistribution.  This 
approach,  clearly,  wastes  storage  space  since  only  one  of  the  arrays  would  be  fruitfully  used 
in  any  single  computation  phase. 
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Another  approach  is  to  use  procedure  boundaries  for  implicit  redistribution  of  data. 
Vienna  Fortran  allows  procedure  arguments  to  be  declared  with  a  specific  distribution.  When 
the  procedure  is  called,  it  is  the  compiler’s  responsibility  to  redistribute  the  actual  argument 
to  match  the  specified  distribution.  Thus,  the  ADI  example  could  be  rewritten  such  that  it 
calls  a  different  subroutine  in  the  second  loop,  one  which  specifically  declares  its  argument  to 
be  distributed  by  block  in  the  first  dimension.  Similarly,  the  grid  example  could  be  written 
such  that  a  different  subroutine  is  called,  depending  on  the  ratio  of  the  size  of  the  grid  and 
the  number  of  executing  processors.  The  problem,  however,  is  that  this  approach  may  lead 
to  an  explosion  of  subroutines  which  are  different  only  in  the  distribution  specified  for  their 
arguments. 

Another  problem  with  using  either  assignment  or  procedure  boundaries  for  implicit  redis¬ 
tribution  is  that  the  approaches  are  particularly  awkward  and  cumbersome  to  use  if  there  is 
an  outer  iterative  loop  around  the  phases  requiring  redistribution.  Further,  it  is  not  always 
feasible  to  write  a  program  such  that  distributions  change  only  at  procedure  boundaries. 
For  example,  in  applications  such  as  adaptive  mesh  codes  or  particle-in-cell  (PIC)  codes,  the 
work  distribution  changes  as  the  computation  progresses.  In  such  codes,  the  data  needs  to 
be  redistributed  dynamically  in  order  to  rebalance  the  workload. 

Consider  a  simulation  code  based  on  the  particle-in-cell  method,  which  can  be  used  to 
study  the  motion  of  particles  in  a  given  domain,  such  as  plasmas  for  controlled  nuclear 
fusion,  or  stars  and  galaxies.  The  computation  at  each  time  step  can  be  divided  into  two 
phases.  In  the  first  phase,  a  global  force  field  is  computed  using  the  current  position  of 
particles.  In  the  second  phase,  given  the  new  global  force  field,  new  positions  of  the  particles 
are  computed.  The  program  can  be  structured  by  dividing  the  underlying  dommn  into  cells 
with  each  cell  owning  a  set  of  particles.  The  particles  move  from  one  cell  to  another  as  they 
change  p^  Jt-ions  across  the  domain.  Since  the  computation  in  each  cell  is  dependent  on  the 
number  of  particles  in  the  cell,  the  workload  across  the  domain  changes  as  the  computation 
progresses. 

Figure  2  shows  the  outermost  level  of  a  simplified  version  of  a  PIC  code  as  expressed  in 
Vienna  Fortran.  The  code  omits  details  irrelevant  to  the  discussion  here.  In  this  code,  the 
cells  are  represented  by  the  first  dimension  of  the  array  FIELD.  There  are  a  maximum  of 
NCELL  cells  and  each  cell  is  restricted  to  have  a  maximtun  of  NPART  particles. 

The  maun  goal  here  is  to  distribute  the  cells  across  the  processors  such  that  the  work  per 
processor  is  approximately  equal.  In  this  code,  we  use  the  generalized  block  distribution  to 
distribute  the  cells  in  irregular  (but  contiguous)  blocks  to  the  processors.  The  block  sizes 
(i.e.,  the  munber  of  contiguous  cells)  are  selected  so  that  each  processor  has  rotighly  the 
same  number  of  particles  on  its  local  part  of  the  domain. 
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PARAMETER  (NCELL  =  . . . ,  NPART  =  . . . ) 

INTEGER  BOUNDS($NP) 

REAL  FIELD(NCELL,  NPART, ...)  DYNAMIC,  DIST(  BLOCK,  :) 


C  Compute  inititd  position  of  particles 

CALL  uutpo8(FIELD,  NCELL,  NPART, ...) 

C  Compute  initial  partition  of  cells 

CALL  baIance(BOUNDS,  FIELD,  NCELL,  NPART, ...) 
DISTRIBUTE  FIELD  ::  BJLOCK  (BOUNDS) 

DO  k  =  1,  MAX.TIME 
C  Compute  new  field 

CALL  updateJield(FIELD,  NCELL,  NPART,  ...) 

C  Compute  new  particle  positions  and  reassign  them 

CALL  update.part(FIELD,  NCELL,  NPART, ...) 

C  Rebalance  every  10th  iteration  if  necessary 

IF  (  MOD  Ot, 10)  .EQ.  0  .AND.  rebalanceO  )  THEN 
CALL  balance(BOUNDS,  FIELD,  NCELL,  NPART, ...) 
DISTRIBUTE  FIELD  BJBLOCK  (BOUNDS) 
ENDIF 

ENDDO 


Figure  2:  High  level  PIC  code  in  Vienna  Fortran 

The  array  FIELD  is  declared  to  be  DYNAMIC  with  the  first  dimension  initi2Jly  dis¬ 
tributed  into  regular  blocks.  The  procedure  initpos  determines  the  initial  position  of  the 
particles  and  places  them  in  the  appropriate  cells.  Using  the  number  of  particles  in  each  cell, 
the  procedure  balance  computes  the  block  sizes  to  be  assigned  to  each  processor.  It  stores 
these  in  the  array  BOUNDS,  which  is  then  used  to  redistribute  the  array  FIELD  via  the  the 
intrinsic  distribution  function  B.BLOCK). 

In  each  time  step  (represented  by  one  iteration  of  the  outer  loop),  the  procedure  up- 
date^field  computes  the  new  force  field  based  on  the  current  particle  positions.  Then,  the 
procedure  updatejpart  is  called  to  update  the  positions  of  the  particles.  Based  on  the  new 
positions,  the  new  owner  cell  for  each  particle  is  determined.  If  a  particle  has  moved  from 
one  cell  to  another,  it  is  explicitly  reassigned.  This  obviously  requires  communication  if  the 
new  cell  is  on  a  different  processor.  Since  this  communication  is  based  on  the  locations  of 
the  current  and  the  new  cell,  it  is  highly  irregular  in  nature.  Thus,  the  compiler  will  have 
to  generate  runtime  code  using  the  inspector/executor  paradigm  [10,  15]  to  support  this 
particle  motion. 
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If  the  number  of  particles  on  each  processor  remains  roughly  equal  for  the  duration  of 
the  simulation,  then  load  balance  will  be  maintuned.  Some  problems  of  this  kind  display 
sufficient  uniformity  such  that  a  simple  block  distribution  will  suffice  to  provide  a  reasonable 
load  balance.  For  other,  problems,  the  motion  of  particles  during  the  simulation  may  lead 
to  a  severe  load  imbalance.  The  code,  as  shown  here,  checks  on  every  10th  iteration  (by 
calling  function  rebclance)  whether  rebalancing  is  required.  If  so,  a  new  BOUNDS  2U'ray  is 
computed  and  the  cells  redistributed  to  balance  the  workload. 

The  redistribution  needed  for  such  load  balancing  is  based  on  the  current  values  of  some 
data  structure,  for  example,  in  the  above  case  it  is  based  on  the  number  of  particles  per 
cell.  Thus,  this  kind  of  redistribution  cannot  be  expressed  using  either  array  assignment  or 
procedure  boundaries  and  requires  language  support  for  dynamic  distributions. 

5  Related  Work 

Kali  [12]  was  the  first  language  to  introduce  dynamic  data  distribution  in  a  data  parallel 
language  aimed  at  distributed  memory  machines.  It  provided  indirect  mapping  and  user 
defined  distribution  functions  which  could  depend  on  runtime  values.  A  distribute  statement 
allowed  the  user  to  dynamically  change  the  distribution  of  an  array  at  runtime.  The  design 
of  Kali  has  greatly  influenced  the  development  of  Vienna  Fortran. 

The  DINO  language,  which  extends  C  by  constructs  for  specifying  virtual  processors  to 
which  data  may  be  mapped,  and  whose  compiler  is  targeted  to  distributed  memory  com¬ 
puters,  supports  redistribution  of  data  at  procedure  boundaries,  but  does  not  extend  these 
mechanisms  to  handle  other  forms  of  user-specified  run-time  distribution  ([14]). 

An  executable  DISTRIBUTE  statement  which  performed  run-time  redistribution  of  ar¬ 
rays  was  formulated  by  Marc  Baber  and  implemented  in  his  Hyperiasking  compiler  for  block 
distributions  of  arrays;  the  system  attempted  to  optimize  the  communication  required  for 
redistribution.  This  system  did  not  permit  procedure  calls  with  distributed  data.  It  has 
been  implemented  on  the  Intel  iPSC  hypercubes  [2]. 

The  Fortran  D  language  proposal  [6]  suggests  a  set  of  features  for  enabling  the  portable 
specification  of  code  to  nm  on  a  variety  of  parallel  architectures,  including  a  dynamic  DIS¬ 
TRIBUTE  statement.  Fortran  D  does  not,  however,  provide  a  means  for  static  distribution 
of  arrays,  and  does  not  include  any  additional  constructs  which  might  enable  the  user  to 
control  or  structure  the  use  of  dynamic  distributions.  As  far  as  we  are  aware,  the  Fortran  D 
implementation  does  not  yet  provide  for  dynamic  data  distributions. 

The  High  Performance  Fortran  proposal  [9]  includes  static  and  dynamic  distributions  in 
much  the  same  way  that  Vienna  Fortran  does  and  has  included  a  small  set  of  distribution 
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queries  in  the  language  constructs.  It  has  REALIGN  and  REDISTRIBUTE  directives  to 
permit  independent  redistribution  and  realignment  of  arrays  during  execution.  These  are 
both  subsumed  by  the  DISTRIBUTE  statement  in  Vienna  Fortran.  In  contrast  to  Vienna 
Fortran,  if  an  array  is  redistributed  in  a  procedure,  HPF  does  not  permit  the  new  distribution 
to  be  returned  to  the  calling  procedure. 

6  Conclusions 

Dynamic  data  distributions  are  essential  for  a  variety  of  real  applications,  which  are  char¬ 
acterized  by  large  variations  in  the  size  or  structure  of  input  data  sets,  the  need  to  perform 
dynamic  load  balancing,  or  the  necessity  to  execute  the  code  on  several  different  architec¬ 
tures  or  different  configurations  of  one  machine.  In  all  these  cases,  the  decision  on  how  to 
map  the  data  arrays  to  the  executing  processors  might  have  to  be  deferred  until  run  time. 

However,  the  deferment  of  such  decisions  makes  it  difficult  for  the  compiler  to  generate 
efficient  code.  This  problem  can  be  alleviated  by  a  combination  of  enhanced  language  sup¬ 
port,  extensive  intra-  and  inter-procedural  compiler  analysis,  and  careful  structuring  of  the 
program  by  the  user  so  that  in  all  critical  code  sections  the  distribution  is  known  at  compile 
time. 
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